2023
Conference article
Open Access
Trustworthy AI at KDD Lab
Giannotti F., Guidotti R., Monreale A., Pappalardo L., Pedreschi D., Pellungrini R., Pratesi F., Rinzivillo S., Ruggieri S., Setzu M., Deluca R.
This document summarizes the activities regarding the development of Responsible AI (Responsible Artificial Intelligence) conducted by the Knowledge Discovery and Data mining group (KDD-Lab), a joint research group of the Institute of Information Science and Technologies "Alessandro Faedo" (ISTI) of the National Research Council of Italy (CNR), the Department of Computer Science of the University of Pisa, and the Scuola Normale Superiore of Pisa.
Source: Ital-IA 2023, pp. 388–393, Pisa, Italy, 29-30/05/2023
Project(s): SoBigData-PlusPlus
See at:
ceur-ws.org | ISTI Repository | CNR ExploRA
2022
Report
Open Access
Mobility data mining: from technical to ethical (Dagstuhl Seminar 22022)
Berendt B., Matwin S., Renso C., Meissner F., Pratesi F., Raffaeta A., Rockwell G.
This report documents the program and the outcomes of Dagstuhl Seminar 22022 "Mobility Data Analysis: from Technical to Ethical", which took place fully remotely, hosted by Schloss Dagstuhl, from 10 to 12 January 2022. An interdisciplinary team of 23 researchers from Europe, the Americas and Asia in the fields of computer science, ethics and mobility analysis discussed interactions between their topics and fields to bridge the gap between the more technical aspects and the ethics, with the objective of laying the foundations of a new Mobility Data Ethics research field.
Source: ISTI Research report, pp. 35–66, 2022
DOI: 10.4230/dagrep.12.1.35
Project(s): MASTER
See at:
ISTI Repository | doi.org | CNR ExploRA
2022
Other
Unknown
The TAILOR Handbook of Trustworthy AI
Albertoni R., Allard T., Alves G., Bringas Colmenarejo A., Buijsman S., Casares P. A M, Colantonio S., Couceiro M., Escobar S., Gonzalez-Castañé G., Guidotti R., Heintz F., Hernandez Orallo J., Kuilman S., Makhlouf K., Martinez Plumed F., Monreale A., Pellungrini R., Pratesi F., Ramachandran Pillai R., Rossi A., Rousset M. C., Ruggieri S., Siebert L. C., Skrzypczyński P., Stefanowski J., Straccia U., O'Sullivan B., Visentin A., Zgonnikov A., Zhioua S.
The main goal of the Handbook of Trustworthy AI is to provide non-experts, especially researchers and students, with an overview of the problems related to developing ethical and trustworthy AI systems. In particular, we want to provide an overview of the main dimensions of trustworthiness, starting with an understandable explanation of the dimensions themselves, and then presenting the characterization of the problems (starting with a brief summary and an explanation of the importance of each dimension, then presenting a taxonomy and some guidelines, if they are available and consolidated), summarizing the major challenges and solutions in the field, as well as the latest research developments.
Project(s): TAILOR
See at:
CNR ExploRA | tailor.isti.cnr.it
2022
Journal article
Open Access
Where do migrants and natives belong in a community: a Twitter case study and privacy risk analysis
Kim J., Pratesi F., Rossetti G., Sîrbu A., Giannotti F.
Today, many users are actively using Twitter to express their opinions and to share information. Thanks to the availability of the data, researchers have studied behaviours and social networks of these users. International migration studies have also benefited from this social media platform to improve migration statistics. Although diverse types of social networks have been studied so far on Twitter, social networks of migrants and natives have not been studied before. This paper aims to fill this gap by studying characteristics and behaviours of migrants and natives on Twitter. To do so, we perform a general assessment of features including profiles and tweets, and an extensive network analysis. We find that migrants have more followers than friends. They have also tweeted more, even though both groups have similar account ages.
More interestingly, the assortativity scores showed that users tend to connect based on nationality more than country of residence, and this is more the case for migrants than natives. Furthermore, both natives and migrants tend to connect mostly with natives. The homophilic behaviours of users are also well reflected in the communities that we detected. Our additional privacy risk analysis showed that Twitter data can be safely used without exposing sensitive information about the users and while minimising the risk of re-identification, in compliance with the GDPR.
Source: Social Network Analysis and Mining 13 (2022). doi:10.1007/s13278-022-01017-0
DOI: 10.1007/s13278-022-01017-0
Project(s): SoBigData-PlusPlus
See at:
link.springer.com | ISTI Repository | CNR ExploRA
2022
Contribution to conference
Open Access
IAIL 2022 - Imagining the AI Landscape after the AI Act
Dushi D., Naretto F., Panigutti C., Pratesi F.
We summarize the first Workshop on Imagining the AI Landscape after the AI Act (IAIL 2022), co-located with the 1st International Conference on Hybrid Human-Artificial Intelligence (HHAI 2022), held on June 13, 2022 in Amsterdam, Netherlands.
Project(s): CoHuBiCoL, TAILOR, HumanE-AI-Net, SoBigData-PlusPlus
See at:
ceur-ws.org | ISTI Repository | CNR ExploRA
2022
Contribution to book
Open Access
Ethics in smart information systems
Pratesi F., Trasarti R., Giannotti F.
This chapter analyses some of the ethical implications of recent developments in artificial intelligence (AI), data mining, machine learning and robotics. In particular, we start by summarising the more consolidated issues and solutions related to privacy in data management systems, moving towards the novel concept of explainability. The chapter reviews the development of the right to privacy and the right to explanation, which culminated in the General Data Protection Regulation. However, the new kinds of big data (such as internet logs or GPS tracking) require a different approach to managing privacy requirements. Several solutions have been developed and will be reviewed here. Our view is that, in general, data protection must be considered from the beginning, as novel AI solutions are being developed, using the Privacy-by-Design paradigm. This involves a shift in perspective away from remedying problems and towards trying to prevent them. We conclude by covering the main requirements necessary to achieve a trustworthy scenario, as advised also by the European Commission. A step in the direction towards Trustworthy AI was achieved in the Ethics Guidelines for Trustworthy Artificial Intelligence produced by an expert group for the European Commission. The key elements in these guidelines will be reviewed in this chapter. To ensure European independence and leadership, we must invest wisely by bundling, connecting and opening our AI resources while also having in mind ethical priorities, such as transparency and fairness.
Source: Ethical evidence and policymaking. Interdisciplinary and international research, edited by Iphofen R., O'Mathúna D., pp. 162–184, 2022
DOI: 10.51952/9781447363972.ch009
DOI: 10.56687/9781447363972-012
DOI: 10.2307/j.ctv2tbwqd5.14
Project(s): TAILOR, PRO-RES, SoBigData-PlusPlus
See at:
bristoluniversitypressdigital.com | doi.org | doi.org | ISTI Repository | doi.org | CNR ExploRA
2021
Journal article
Open Access
Give more data, awareness and control to individual citizens, and they will help COVID-19 containment
Nanni M., Andrienko G., Barabasi A. -L., Boldrini C., Bonchi F., Cattuto C., Chiaromonte F., Comande G., Conti M., Cote M., Dignum F., Dignum V., Domingo-Ferrer J., Ferragina P., Giannotti F., Guidotti R., Helbing D., Kaski K., Kertesz J., Lehmann S., Lepri B., Lukowicz P., Matwin S., Jimenez D. M., Monreale A., Morik K., Oliver N., Passarella A., Passerini A., Pedreschi D., Pentland A., Pianesi F., Pratesi F., Rinzivillo S., Ruggieri S., Siebes A., Torra V., Trasarti R., Hoven J., Vespignani A.
The rapid dynamics of COVID-19 calls for quick and effective tracking of virus transmission chains and early detection of outbreaks, especially in the "phase 2" of the pandemic, when lockdown and other restriction measures are progressively withdrawn, in order to avoid or minimize contagion resurgence. For this purpose, contact-tracing apps are being proposed for large scale adoption by many countries. A centralized approach, where data sensed by the app are all sent to a nation-wide server, raises concerns about citizens' privacy and needlessly strong digital surveillance, thus alerting us to the need to minimize personal data collection and avoid location tracking. We advocate the conceptual advantage of a decentralized approach, where both contact and location data are collected exclusively in individual citizens' "personal data stores", to be shared separately and selectively (e.g., with a backend system, but possibly also with other citizens), voluntarily, only when the citizen has tested positive for COVID-19, and with a privacy preserving level of granularity. This approach better protects the personal sphere of citizens and affords multiple benefits: it allows for detailed information gathering for infected people in a privacy-preserving fashion; and, in turn, this enables both contact tracing and the early detection of outbreak hotspots on a more finely granulated geographic scale.
The decentralized approach is also scalable to large populations, in that only the data of positive patients need be handled at a central level. Our recommendation is two-fold. First, to extend existing decentralized architectures with a light touch, in order to manage the collection of location data locally on the device, and allow the user to share spatio-temporal aggregates (if and when they want, and for specific aims) with health authorities, for instance. Second, we favour a longer-term pursuit of realizing a Personal Data Store vision, giving users the opportunity to contribute to collective good in the measure they want, enhancing self-awareness, and cultivating collective efforts for rebuilding society.
Source: Ethics and information technology 23 (2021). doi:10.1007/s10676-020-09572-w
DOI: 10.1007/s10676-020-09572-w
Project(s): SoBigData-PlusPlus
See at:
Aaltodoc Publication Archive | Ethics and Information Technology | Recolector de Ciencia Abierta, RECOLECTA | Archivio Istituzionale | link.springer.com | City Research Online | ISTI Repository | Online Research Database In Technology | NARCIS | Digitala Vetenskapliga Arkivet - Academic Archive On-line | Publikationer från Umeå universitet | kclpure.kcl.ac.uk | Fraunhofer-ePrints | publons.com | www.scopus.com | CNR ExploRA
2020
Journal article
Open Access
(So) Big Data and the transformation of the city
Andrienko G., Andrienko N., Boldrini C., Caldarelli G., Cintia P., Cresci S., Facchini A., Giannotti F., Gionis A., Guidotti R., Mathioudakis M., Muntean C. I., Pappalardo L., Pedreschi D., Pournaras E., Pratesi F., Tesconi M., Trasarti R.
The exponential increase in the availability of large-scale mobility data has fueled the vision of smart cities that will transform our lives. The truth is that we have just scratched the surface of the research challenges that should be tackled in order to make this vision a reality. Consequently, there is an increasing interest among different research communities (ranging from civil engineering to computer science) and industrial stakeholders in building knowledge discovery pipelines over such data sources. At the same time, this widespread data availability also raises privacy issues that must be considered by both industrial and academic stakeholders. In this paper, we provide a wide perspective on the role that big data have in reshaping cities. The paper covers the main aspects of urban data analytics, focusing on privacy issues, algorithms, applications and services, and georeferenced data from social media. In discussing these aspects, we leverage, as concrete examples and case studies of urban data science tools, the results obtained in the "City of Citizens" thematic area of the Horizon 2020 SoBigData initiative, which includes a virtual research environment with mobility datasets and urban analytics methods developed by several institutions around Europe. We conclude the paper outlining the main research challenges that urban data science has yet to address in order to help make the smart city vision a reality.
Source: International Journal of Data Science and Analytics (Print) 1 (2020). doi:10.1007/s41060-020-00207-3
DOI: 10.1007/s41060-020-00207-3
Project(s): SoBigData
See at:
Aaltodoc Publication Archive | International Journal of Data Science and Analytics | White Rose Research Online | HELDA - Digital Repository of the University of Helsinki | Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari | link.springer.com | City Research Online | ISTI Repository | Fraunhofer-ePrints | CNR ExploRA
2020
Journal article
Open Access
An ethico-legal framework for social data science
Forgó N., Hänold S., Van Den Hoven J., Krügel T., Lishchuk I., Mahieu R., Monreale A., Pedreschi D., Pratesi F., Van Putten D.
This paper presents a framework for research infrastructures enabling ethically sensitive and legally compliant data science in Europe. Our goal is to describe how to design and implement an open platform for big data social science, including, in particular, personal data. To this end, we discuss a number of infrastructural, organizational and methodological principles to be developed for a concrete implementation. These include not only systematic tools and methodologies that effectively enable both the empirical evaluation of the privacy risk and data transformations by using privacy-preserving approaches, but also the development of training materials (a massive open online course) and organizational instruments based on legal and ethical principles. This paper provides, by way of example, the implementation that was adopted within the context of the SoBigData Research Infrastructure.
Source: International Journal of Data Science and Analytics (Print) 11 (2020): 377–390. doi:10.1007/s41060-020-00211-7
DOI: 10.1007/s41060-020-00211-7
Project(s): SoBigData, SoBigData-PlusPlus
See at:
Vrije Universiteit Brussel Research Portal | ISTI Repository | NARCIS | CNR ExploRA
2019
Journal article
Open Access
PRIMULE: Privacy risk mitigation for user profiles
Pratesi F., Gabrielli L., Cintia P., Monreale A., Giannotti F.
The availability of mobile phone data has encouraged the development of different data-driven tools, supporting social science studies and providing new data sources to the standard official statistics. However, this particular kind of data is subject to privacy concerns because it can enable the inference of personal and private information. In this paper, we address the privacy issues related to the sharing of user profiles, derived from mobile phone data, by proposing PRIMULE, a privacy risk mitigation strategy. Such a method relies on PRUDEnce (Pratesi et al., 2018), a privacy risk assessment framework that provides a methodology for systematically identifying risky users in a set of data. An extensive experimentation on real-world data shows the effectiveness of the PRIMULE strategy in terms of both quality of mobile user profiles and utility of these profiles for analytical services such as the Sociometer (Furletti et al., 2013), a data mining tool for city users classification.
Source: Data & knowledge engineering 125 (2019). doi:10.1016/j.datak.2019.101786
DOI: 10.1016/j.datak.2019.101786
Project(s): SoBigData
See at:
ISTI Repository | Archivio istituzionale della Ricerca - Scuola Normale Superiore | Data & Knowledge Engineering | www.sciencedirect.com | CNR ExploRA
2018
Contribution to book
Open Access
How data mining and machine learning evolved from relational data base to data science
Amato G., Candela L., Castelli D., Esuli A., Falchi F., Gennaro C., Giannotti F., Monreale A., Nanni M., Pagano P., Pappalardo L., Pedreschi D., Pratesi F., Rabitti F., Rinzivillo S., Rossetti G., Ruggieri S., Sebastiani F., Tesconi M.
During the last 35 years, data management principles such as physical and logical independence, declarative querying and cost-based optimization have led to the profound pervasiveness of relational databases in every kind of organization. More importantly, these technical advances have enabled the first round of business intelligence applications and laid the foundation for managing and analyzing Big Data today.
Source: A Comprehensive Guide Through the Italian Database Research Over the Last 25 Years, edited by Sergio Flesca, Sergio Greco, Elio Masciari, Domenico Saccà, pp. 287–306, 2018
DOI: 10.1007/978-3-319-61893-7_17
See at:
arpi.unipi.it | ISTI Repository | doi.org | link.springer.com | CNR ExploRA
2018
Conference article
Open Access
Privacy Preserving Multidimensional Profiling
Pratesi F., Monreale A., Giannotti F., Pedreschi D.
Recently, big data has become central in the analysis of human behavior and the development of innovative services. In particular, a new class of services is emerging, taking advantage of different sources of data, in order to consider the multiple aspects of human beings. Unfortunately, these data can lead to re-identification problems and other privacy leaks, as widely reported in both scientific literature and media. The risk is even more pressing if multiple sources of data are linked together since a potential adversary could know information related to each dataset. For this reason, it is necessary to evaluate accurately and mitigate the individual privacy risk before releasing personal data. In this paper, we propose a methodology for the first task, i.e., assessing privacy risk, in a multidimensional scenario, defining some possible privacy attacks and simulating them using real-world datasets.
Source: 3rd International Conference on Smart Objects and Technologies for Social Good, GOODTECHS 2017, pp. 142–152, Pisa, Italy, 29-30/11/2017
DOI: 10.1007/978-3-319-76111-4_15
Project(s): SoBigData
See at:
link.springer.com | ISTI Repository | Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering | Archivio istituzionale della Ricerca - Scuola Normale Superiore | CNR ExploRA
2018
Journal article
Open Access
PRUDEnce: A system for assessing privacy risk vs utility in data sharing ecosystems
Pratesi F., Monreale A., Trasarti R., Giannotti F., Pedreschi D., Yanagihara T.
Data describing human activities are an important source of knowledge useful for understanding individual and collective behavior and for developing a wide range of user services. Unfortunately, this kind of data is sensitive, because people's whereabouts may allow re-identification of individuals in a de-identified database. Therefore, Data Providers, before sharing those data, must apply some form of anonymization to lower the privacy risks, but they must also be aware of, and capable of controlling, the data quality, since there is often a trade-off between these two factors. In this paper we propose PRUDEnce (Privacy Risk versus Utility in Data sharing Ecosystems), a system enabling a privacy-aware ecosystem for sharing personal data. It is based on a methodology for assessing both the empirical (not theoretical) privacy risk associated with users represented in the data, and the data quality guaranteed only with users not at risk. Our proposal is able to support the Data Provider in the exploration of a repertoire of possible data transformations with the aim of selecting one specific transformation that yields an adequate trade-off between data quality and privacy risk. We study the practical effectiveness of our proposal over three data formats underlying many services, defined on real mobility data, i.e., presence data, trajectory data and road segment data.
Source: Transactions on data privacy 11 (2018): 139–167.
Project(s): SoBigData
See at:
ISTI Repository | www.tdp.cat | CNR ExploRA
2017
Contribution to book
Restricted
Assessing privacy risk in retail data
Pellungrini R., Pratesi F., Pappalardo L.
Retail data are one of the most requested commodities by commercial companies. Unfortunately, from these data it is possible to retrieve highly sensitive information about individuals. Thus, there exists the need for accurate individual privacy risk evaluation. In this paper, we propose a methodology for assessing privacy risk in retail data. We define the data formats for representing retail data, the privacy framework for calculating privacy risk and some possible privacy attacks for this kind of data. We perform experiments on a real-world retail dataset, and show the distribution of privacy risk for the various attacks.
Source: Personal Analytics and Privacy. An Individual and Collective Perspective, edited by Riccardo Guidotti, Anna Monreale, Dino Pedreschi, Serge Abiteboul, pp. 17–22, 2017
DOI: 10.1007/978-3-319-71970-2_3
Project(s): SoBigData
See at:
Lecture Notes in Computer Science | link.springer.com | CNR ExploRA
2017
Conference article
Restricted
Fast estimation of privacy risk in human mobility data
Pellungrini R., Pappalardo L., Pratesi F., Monreale A.
Mobility data are an important proxy to understand the patterns of human movements, develop analytical services and design models for simulation and prediction of human dynamics. Unfortunately, mobility data are also very sensitive, since they may contain personal information about the individuals involved. Existing frameworks for privacy risk assessment enable the data providers to quantify and mitigate privacy risks, but they suffer from two main limitations: (i) they have a high computational complexity; (ii) the privacy risk must be re-computed for each new set of individuals, geographic areas or time windows. In this paper we explore a fast and flexible solution to estimate privacy risk in human mobility data, using predictive models to capture the relation between an individual's mobility patterns and her privacy risk. We show the effectiveness of our approach by experimentation on a real-world GPS dataset and provide a comparison with traditional methods.
Source: SAFECOMP 2017 - International Conference on Computer Safety, Reliability, and Security, pp. 415–426, Trento, Italy, 12 September 2017
DOI: 10.1007/978-3-319-66284-8_35
Project(s): SoBigData
See at:
Lecture Notes in Computer Science | link.springer.com | CNR ExploRA
2016
Report
Unknown
PRISQUIT: a system for assessing privacy risk versus quality in data sharing
Pratesi F., Monreale A., Trasarti R., Giannotti F., Pedreschi D., Yanagihara T.
Data describing human activities are an important source of knowledge useful for understanding individual and collective behavior and for developing a wide range of user services. Unfortunately, this kind of data is sensitive, because people's whereabouts may allow re-identification of individuals in a de-identified database. Therefore, Data Providers, before sharing those data, must apply some form of anonymization to lower the privacy risks, but they must also be aware of, and capable of controlling, the data quality, since there is often a trade-off between these two factors. In this paper we propose PRISQUIT (Privacy RISk versus QUalITy), a system enabling a privacy-aware ecosystem for sharing personal data. It is based on a methodology for assessing both the empirical (not theoretical) privacy risk associated with users represented in the data, and the data quality guaranteed only with users not at risk. Our proposal is able to support the Data Provider in the exploration of a repertoire of possible data transformations with the aim of selecting one specific transformation that yields an adequate trade-off between data quality and privacy risk. We study the practical effectiveness of our proposal over three data formats underlying many services, defined on real mobility data, i.e., presence data, trajectory data and road segment data.
Source: ISTI Technical reports, 2016
Project(s): SoBigData
See at:
CNR ExploRA
2015
Conference article
Restricted
Managing travels with PETRA: The Rome use case
Botea A., Braghin S., Lopes N., Guidotti R., Pratesi F.
The aim of the PETRA project is to provide the basis for a city-wide transportation system that supports policies catering for both individual preferences of users and city-wide travel patterns. The PETRA platform will be initially deployed in the partner city of Rome, and later in Venice and Tel-Aviv.
Source: 31st IEEE International Conference on Data Engineering. Data Mining and Smart Cities Applications Workshop, pp. 110–111, Seoul, Korea, 13-17/04/2015
DOI: 10.1109/icdew.2015.7129558
Project(s): PETRA
See at:
doi.org | ieeexplore.ieee.org | CNR ExploRA
2015
Conference article
Restricted
Mobility Mining for Journey Planning in Rome
Berlingerio M., Bicer V., Botea A., Braghin S., Lopes N., Guidotti R., Pratesi F.
We present recent results on integrating private car GPS routines, obtained by a Data Mining module, into the PETRA (PErsonal TRansport Advisor) platform. The routines are used as additional "bus lines", available to provide a ride to travelers. We present the effects of querying the planner with and without the routines, which show how Data Mining may help Smarter Cities applications.
Source: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. European Conference, pp. 222–226, Porto, Portugal, 07-11/09/2015
DOI: 10.1007/978-3-319-23461-8_18
Project(s): PETRA
See at:
doi.org | link.springer.com | CNR ExploRA